Abstract: Exchanging too many messages for fault detection not only degrades the network quality of service but also places a heavy burden on the limited energy of sensors. Therefore, we propose an uncertainty-based distributed fault detection algorithm for wireless sensor networks that relies on the aided judgment of neighbors. The algorithm considers the serious influence of sensing measurement loss and therefore uses Markov decision processes to fill in missing data. Most importantly, fault misjudgments caused by uncertainty conditions are the main drawback of traditional distributed fault detection mechanisms. We draw on evidence fusion rules based on information entropy theory and a degree-of-disagreement function to increase the accuracy of fault detection. Simulation results demonstrate that our algorithm can effectively reduce the communication energy overhead due to message exchanges and provide a higher detection accuracy ratio.

Keywords: fault detection; uncertainty; evidence fusion; data missing; information entropy 



1. Introduction 

Sensors can be rapidly deployed into large areas and perform monitoring tasks through autonomous wireless communication. In disaster prevention applications, for example, nodes detect and estimate environmental information and then forecast when and where a natural calamity may occur. Although users always hope that the network will provide excellent monitoring and gathering functions, it seems inevitable that sensors will experience faults caused by extrinsic and intrinsic factors. Generally, a fault is an unexpected change in the network that leads to measurement errors, system breakdown, or communication failure.

Faults are generally classified as crash, timing, omission, incorrect computation, fail breakdown, authenticated Byzantine, etc. [1]. From another point of view, crash faults are classified as communication faults, since under those conditions a sensor cannot communicate with others because its communication module has failed or the link is down. By contrast, all other faults are viewed as data faults, which means that the faulty sensors can communicate with each other, but the sensed or transmitted data are not correct [2,3]. To avoid erroneous judgments due to faults, broken-down nodes should be detected and isolated from functioning nodes. Fault detection should make an unambiguous decision about whether the behavior of a sensor deviates from the common measurements of other sensors.

Sensors typically form a local view of the fault state of sensors by collecting measurements from their one-hop neighbors. Neighbor cooperation is one approach to fault detection, whereby a sensor uses neighbor measurements to decide its own fault state collaboratively [4-6]. This approach has been shown to be effective for fault information collection and diagnosis because it relieves sink nodes or base stations of this overhead and thus avoids network bottlenecks. Accordingly, a new challenge for fault detection in wireless sensor networks (WSNs) is how to reduce energy consumption when exchanging messages is the main means of fault detection in a distributed environment. Moreover, in some advanced medical care or battlefield response applications, the dynamic network topology and the signal loss caused by long propagation delays and signal fading affect the efficiency of fault detection.

In the majority voting algorithm based on neighbor cooperation, the normal measurements of sensors located close to each other are assumed to be spatially correlated, while faulty data are uncorrelated. The tendency state of a sensor is determined as possibly faulty (LF) or possibly good (LG) by comparing its own readings with those of its one-hop neighbors in a voting process. If the number of LG neighbors with correlated readings is greater than or equal to half, the sensor is fault-free; otherwise it is deemed faulty. The weighted voting approach uses geographic distance or degree of trust as the deciding factor when calculating the sensor state, but these methods perform well only under the assumption of a high average connectivity degree. In practice, sensors are usually deployed in lower-connectivity environments, in which too few readings are exchanged to make an appropriate comparison and decision (e.g., in Figure 1a a frontier node has only one neighbor). As a result, the detection accuracy decreases as the fault rate increases. Moreover, in intrusion monitoring, faults caused by attacks are unevenly distributed because hostile signals without a fixed route randomly affect or tamper with readings.

In this paper, we mainly focus on sensing faults rather than communication faults. After analyzing the defects of traditional algorithms, we present an Uncertainty-based Distributed Fault Detection (uDFD) mechanism for wireless sensor networks. The main contributions of uDFD are as follows:

(1) We propose the uncertainty-based distributed fault detection algorithm, which avoids a decrease in fault detection accuracy when the failure ratio becomes higher. In addition, the fault detection accuracy remains high even in low-connectivity scenarios;



Sensors 2014, 14 






(2) Data loss influences fault judgment because each sensor determines its own state step by step according to its neighbors' measurements. We present a data prediction model based on Markov decision processes that fills in lost data to provide reference data for other nodes' state determinations;

(3) We classify sensors' tendency states into two types: Possibly Good (LG) and Undetermined (Un). LG nodes contribute to determining nodes' ultimate states. Un nodes are in an uncertain status, so their ultimate status must be determined separately. Here we design belief probability assignment (BPA) functions for different evidences that reflect the states of Un nodes. Moreover, an evidence fusion rule based on information entropy theory is used to mitigate evidence conflicts.

Figure 1. Fault detection illustration (panels (a) and (b); node labels G and F denote good and faulty nodes).

The rest of the paper is organized as follows: Section 2 describes related work on fault detection in WSNs. Section 3 introduces our Uncertainty-based Distributed Fault Detection algorithm (uDFD) and the concrete mechanisms involved. Section 4 presents simulation results compared with typical fault detection algorithms such as DFD and IDFD, and demonstrates our algorithm's efficiency and superiority. Section 5 concludes the paper.

2. Related Works 

In this section, we briefly review related work on distributed and centralized fault detection in WSNs. The authors of [4] proposed and evaluated a localized fault detection scheme (DFD) to identify faulty sensors, and an improved DFD scheme was proposed by Jiang [5]. In these schemes, neighbors periodically exchange sensing measurements, and each sensor judges its own state (good or faulty) according to its neighbors' values. The faulty sensor identification algorithm reported in [7] is completely localized, requires low computational overhead, and can easily be scaled to large sensor networks. In that algorithm, the reading of a sensor is compared with the median of its neighbors' readings. If the difference is large in either the positive or the negative direction, the sensor is deemed faulty. If half of the neighbors are faulty and the number of neighbors is even, the algorithm cannot detect faults.
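The median comparison described for [7] can be sketched in a few lines. This is a minimal illustration only: the function name `median_fault_check` and the single symmetric `threshold` are assumptions, not details taken from [7].

```python
from statistics import median


def median_fault_check(own_reading: float, neighbor_readings: list, threshold: float) -> bool:
    """Flag a sensor as faulty when its reading is far from the neighbors' median.

    Hypothetical helper: a single symmetric threshold is assumed, so a large
    positive deviation and a large negative deviation are treated alike.
    """
    if not neighbor_readings:
        raise ValueError("at least one neighbor reading is required")
    deviation = own_reading - median(neighbor_readings)
    return abs(deviation) > threshold


# Usage: one sensor reports 35.0 while its neighbors report about 20.
print(median_fault_check(35.0, [20.1, 20.4, 19.8], threshold=2.0))  # True
print(median_fault_check(20.3, [20.1, 20.4, 19.8], threshold=2.0))  # False
```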

Krishnamachari and co-workers proposed in [8] a distributed solution for the canonical task of binary detection of interesting environmental events. They explicitly take into account the possibility of measurement faults and develop a distributed Bayesian scheme for detecting and correcting faults. Each sensor node identifies its own status based on local comparisons of sensed data with some thresholds and on dissemination of the test results [9]. Time redundancy is used to tolerate transient sensing and communication faults. To eliminate the delay involved in a time redundancy scheme, a sliding window is employed with some data storage for comparison with previous results.

The MANNA scheme [10] creates a manager located externally to the WSN. It has a global view of the network and can perform complex tasks that would not be possible inside the network. Management activities take place when sensor nodes are collecting and sending temperature data. Every node checks its energy level and sends a message to the manager/agent whenever there is a state change. The manager can then obtain the coverage map and energy level of all sensors from the collected information. To detect node failures, the manager sends GET operations to retrieve the node state. If it does not hear from a node, the manager consults the energy map to check that node's residual energy. In this way, the MANNA architecture is able to locate faulty sensor nodes. However, this approach requires an external manager to perform centralized diagnosis, and the communication between nodes and the manager is too expensive for WSNs.

Tsang-Yi et al. [11] proposed distributed fault-tolerant decision fusion in the presence of sensor faults. The collaborative sensor fault detection (CSFD) scheme is proposed to eliminate unreliable local decisions. In this approach, the local sensors send their decisions sequentially to a fusion center. The scheme establishes an upper bound on the fusion error probability based on a pre-designed fusion rule; this bound assumes identical local decision rules and fault-free environments. The authors proposed a criterion based on this error bound to search for faulty sensor nodes. Once the fusion center identifies the faulty sensor nodes, all corresponding local decisions are removed from the computation of the likelihood ratios used to make the final decision. This approach considers crash and incorrect computation faults.

In [12], a taxonomy for the classification of faults in sensor networks and the first on-line model-based testing technique are introduced. The technique considers the impact of a particular sensor's readings on the consistency of multi-sensor fusion. A sensor is most likely to be faulty if its elimination significantly improves the consistency of the results. One way to identify random noise is to run a maximum likelihood or Bayesian approach on the multi-sensor fusion measurements; if the accuracy of the final multi-sensor fusion results improves after running these procedures, random noise is present. To obtain a consistent mapping of the sensed phenomena, different sensors' measurements need to be combined in a model. This cross-validation-based technique can be applied to a broad set of fault models. It is generic and can be applied to an arbitrary system of sensors that uses an arbitrary type of data fusion. However, the technique is centralized: sensor node information must be collected and sent to the base station to conduct on-line fault detection.

Miao et al. [13] presented an online lightweight failure detection scheme named Agnostic Diagnosis (AD). This approach is motivated by the fact that the system metrics of sensors (e.g., radio-on time, number of packets transmitted) usually exhibit certain correlation patterns. It collects 22 types of metrics that are classified into four categories: (1) timing metrics (e.g., RadioOnTimeCounter), which denote the accumulative radio-on time; (2) traffic metrics (e.g., TransmitCounter), which record the accumulative number of packets transmitted by a sensor node; (3) task metrics (e.g., TaskExecCounter), which give the accumulative number of tasks executed; and (4) other metrics such as Parent Change Counter, which counts the number of parent changes. AD exploits the correlations between the metrics of each sensor using a correlation graph that describes the status of the sensor node. By mining the periodically updated correlation graphs, abnormal correlations are detected in time. In addition to predefined faults (i.e., faults with known types and symptoms), silent failures caused by Byzantine faults are also considered.

Exchanging too many messages for fault detection not only degrades the network quality of service but also places a heavy burden on the limited energy of sensors. Hence, we design an uncertainty-based distributed fault detection mechanism based on neighbor cooperation in WSNs. It adopts auto-correlated test results to describe the differences in sensing states from day to day, and introduces information entropy-based D-S evidence theory to deduce the actual states of undetermined nodes.

3. Uncertainty-Based Fault Detection Mechanism 

3.1. The DFD and IDFD Schemes and Their Drawbacks 

This section presents the DFD algorithm proposed by Chen [4] and the IDFD algorithm described by Jiang [5] to give an overview of distributed fault detection, and then analyzes these algorithms' drawbacks. Chen [4] introduced a localized fault detection method based on exchanging measurements in WSNs. Let $x_i$ be the measurement of node $i$. We define $d_{ij}^{t}$ as the measurement difference between nodes $i$ and $j$ at time $t$, and $\Delta d_{ij}^{\Delta t_l}$ as the change in this difference from time $t_l$ to $t_{l+1}$:

$$d_{ij}^{t} = x_i^{t} - x_j^{t} \quad (1)$$

$$\Delta d_{ij}^{\Delta t_l} = \left(x_i^{t_{l+1}} - x_j^{t_{l+1}}\right) - \left(x_i^{t_l} - x_j^{t_l}\right) = d_{ij}^{t_{l+1}} - d_{ij}^{t_l} \quad (2)$$

When $|d_{ij}^{t}|$ is less than or equal to a predefined threshold $\theta_1$, the test result $c_{ij}$ is set to 0; otherwise, $|\Delta d_{ij}^{\Delta t_l}|$ is calculated. If $|\Delta d_{ij}^{\Delta t_l}| > \theta_2$ ($\theta_2$ is also a predefined threshold), then $c_{ij} = 1$; otherwise $c_{ij} = 0$. Here $c_{ij} = 1$ means that nodes $i$ and $j$ are possibly in different states. Next, the tendency status (possibly faulty, LF, or possibly good, LG) is determined according to the following formula [14]:

$$T_i = \begin{cases} LF, & \text{if } \sum_{j \in N_i} c_{ij} \ge \lceil |N_i|/2 \rceil \\ LG, & \text{otherwise} \end{cases} \quad (3)$$

where $|N_i|$ is the number of one-hop neighbors of node $i$. The formula states that a sensor is deemed possibly good only if fewer than $\lceil |N_i|/2 \rceil$ neighbors have test results equal to 1. To process the second round of tests, each node sends its tendency state to its one-hop neighbors. In the DFD algorithm, node $i$ is finally decided to be fault-free ($Z_i = GOOD$) only if the difference $\sum_{j \in N_i,\, T_j = LG} (1 - c_{ij}) - \sum_{j \in N_i,\, T_j = LG} c_{ij} = \sum_{j \in N_i,\, T_j = LG} (1 - 2c_{ij})$ is greater than or equal to $\lceil |N_i|/2 \rceil$; otherwise node $i$ is undetermined. To improve identification efficiency for undetermined sensors, these nodes repeatedly check whether any of their neighbors is fault-free. If such a neighbor exists, the sensor is faulty (fault-free) according to the test result 1 (0) between them. A sensor may be unable to decide its own state because the states of its neighbors conflict, e.g., $Z_j = Z_k = GOOD$ while $c_{ji} \ne c_{ki}$. In that case, $Z_i$ is GOOD if $T_i = LG$; otherwise $Z_i$ is FAULT.

Jiang [5] considers the determinant condition $\sum_{j \in N_i,\, T_j = LG} (1 - 2c_{ij}) \ge \lceil |N_i|/2 \rceil$ in the DFD algorithm too harsh, since it leads some normal nodes to be misdiagnosed as faulty, so the determinant condition for a normal node is amended as:

$$\sum_{j \in N_i,\, T_j = LG} c_{ij} < \left\lceil \frac{|N_i|_{T_j = LG}}{2} \right\rceil \quad (4)$$

where $|N_i|_{T_j = LG}$ denotes the number of neighbors of node $i$ whose tendency status is LG. If no neighbor has tendency status LG, the final status is set as normal (faulty) based on $T_i = LG$ ($T_i = LF$). Although simulations show that this mechanism improves fault detection accuracy to a certain extent, it has no clear way to resolve conflicts or erroneous judgments, as illustrated in Figure 1.

In Figure 1a, the three test results calculated for node 1 are all 0, so $T_1$ is set as LG according to Equation (3). In the same way, we get $T_2 = LF$, $T_3 = LF$, and $T_4 = LF$. Node 1 has no neighbor whose tendency status is LG, so its final status is set as normal based on the rule for $T_1 = LG$. This is an obvious erroneous judgment.

The tendency states in Figure 1b are calculated as follows: $T_1 = LF$, $T_2 = LG$, $T_3 = LF$, $T_4 = LG$. For node 1, $\sum_{j \in N_1,\, T_j = LG} c_{1j} = 1 + 0 = 1$ and $\lceil |N_1|_{T_j = LG}/2 \rceil = \lceil 2/2 \rceil = 1$. Since $1 < 1$ does not hold, node 1 is decided as faulty according to Equation (4). Actually, node 1 is a normal sensor. Such a misjudgment occurs whenever the number of normal neighbors equals the number of faulty neighbors, provided that all of these neighbors have the initial tendency state LG.
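The Figure 1b case can be checked directly against Equation (4). In this illustrative sketch the function name and data layout are assumptions; LF neighbors are left out because Equation (4) only counts LG neighbors, and which of the two LG neighbors disagrees with node 1 does not change the outcome.

```python
import math


def idfd_is_good(c_row, tendencies):
    """Equation (4): node i is normal iff the disagreeing LG neighbors number
    strictly fewer than ceil(|N_i|_{T_j=LG} / 2)."""
    lg_neighbors = [j for j in c_row if tendencies[j] == "LG"]
    disagreements = sum(c_row[j] for j in lg_neighbors)
    return disagreements < math.ceil(len(lg_neighbors) / 2)


# Figure 1b, node 1: two LG neighbors, one test result equal to 1 and one equal to 0.
print(idfd_is_good({2: 1, 4: 0}, {2: "LG", 4: "LG"}))  # False: 1 < ceil(2/2) = 1 fails
```

The strict inequality is what turns the one-against-one tie into a FAULT decision for a normal node.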

Analysis of the misjudgment conditions of traditional algorithms reveals two defects. First, an indeterminacy occurs at the boundary of Equation (4), where both sides are equal, and thus the node cannot be reliably classified as good or faulty. Second, these algorithms ignore the sensors' own measurements, which tend to be similar at the same time on adjacent days (e.g., 8 June and 9 June). Such analogous historical readings of the same node help determine the fault state under ambiguous conditions.

Moreover, most distributed fault detection mechanisms assume that sensors can acquire every measurement and cooperatively judge each other's states. When a sensor's communication module fails but its acquisition module is still active, its readings cannot be received by its neighbors. In a distributed collaborative process, nodes diagnose data faults based primarily on neighbors' data. Once a neighbor's data are missing, the accuracy of fault diagnosis suffers; e.g., in Figure 1b, node 4 cannot determine its own status when node 1 has no data.

3.2. Uncertainty-Based Distributed Fault Detection Algorithm 

In this paper, we mainly resolve the following problems: (1) missing data before readings are exchanged; (2) misjudgments caused by indeterminacy conditions. Missing data due to communication faults affect the determination accuracy when neighbors' measurements are compared. To handle data loss, a faulty sensing node should fill in the missing measurements to provide reference data. Secondly, the proposed algorithm adopts auto-correlated test results to describe the differences between days. Finally, for the undetermined cases described in Section 3.1, evidence fusion theory is improved by combining information entropy with a degree-of-disagreement function to help deduce the nodes' actual states. In addition, using information entropy in the evidence fusion reduces evidence conflicts and increases detection accuracy.

3.2.1. Definitions 

We list the notations in the uDFD algorithm as follows: 

• $p$: Probability of fault of a sensor;

• $N_i$: Set of neighbors of node $i$;

• $x_i^{t,D}$: Measurement value of node $i$ at time $t$ on day $D$;

• $|N_i|$: Number of one-hop neighbors of node $i$;

• $d_{ij}^{t}$: Measurement difference between nodes $i$ and $j$ at time $t$ on the same day according to Formula (1);

• $\Delta d_{ij}^{\Delta t_l}$: Measurement difference between nodes $i$ and $j$ from time $t_l$ to $t_{l+1}$ on the same day according to Formula (2);

• $\Delta d_i^{t,D}$: Measurement difference of node $i$ at the same time $t$ on different days;

• $c_{ij}$: Test result between nodes $i$ and $j$, $c_{ij} \in \{0, 1\}$;

• $T_i$: Tendency value of a sensor, $T_i \in \{LG, Un\}$;

• $Z_i$: Determined detection status of a sensor, $Z_i \in \{GOOD, FAULT\}$;

• $\theta_1, \theta_2, \theta_3$: Predefined threshold values for the measurement differences;

• $Num_i(\{G\})$: Number of good neighbors of node $i$;

• $Num_i(\{F\})$: Number of faulty neighbors of node $i$.

3.2.2. Fault Detection

The main processes of the uDFD algorithm based on neighbor cooperation are summarized as follows. The key techniques for solving the two problems are described in Sections 3.2.3 and 3.2.4.

Stage 1: Each sensor acquires readings from its own sensing module. If no data are acquired, it fills in the missing data. It then exchanges the measurement at time $t$ on day $D$ with its neighbors and calculates the test result $c_{ij}$ (it is assumed that $c_{ij} = 0$ at the initial time):

1: If $|\Delta d_{ij}^{\Delta t_l}| < \theta_3$, then set $c_{ij} = 0$;
2: else $c_{ij} = 1$;
3: end if
4: If $|d_{ij}^{t}| > \theta_1$, then $c_{ij} = 1$;
5: else if $|d_{ij}^{t}| \le \theta_1$ && $|\Delta d_i^{t,D}| > \theta_2$, then $c_{ij} = 1$;
6: else $c_{ij} = 0$;
7: end if
8: Repeat the above steps until all test results with respect to neighbors are obtained.

Stage 2: Node $i$ generates its tendency value based on $c_{ij}$ ($\forall j \in N_i$):

9: If $\sum_{j \in N_i} c_{ij} < \lceil |N_i|/2 \rceil$, then $T_i = LG$;
10: else $T_i = Un$;
11: end if
12: Broadcast the tendency status if $T_i = LG$.

Stage 3: Calculate the determined status of LG nodes:

13: If $T_i = LG$ && $(\exists j \in N_i)\ T_j = LG$,
14: if $\sum_{j \in N_i,\, T_j = LG} c_{ij} < \dfrac{Num(N_i)_{T_j = LG} + 1}{2}$,
15: then $Z_i = GOOD$;
16: else $Z_i = FAULT$;
17: end if
18: else if $T_i = LG$ && no neighbor is LG, then $T_i = Un$;
19: end if

An LG node can determine its own status (good or faulty), and only good sensors broadcast their states in order to save transmission overhead.



Stage 4: A node whose tendency status is Un determines its actual state by using the entropy-based evidence combination mechanism:

20: Node $i$ ($T_i \in \{LF, Un\}$) receives the evidence of its good neighbors;
21: Combine the evidences generated by measurements using information entropy-based evidence fusion, and acquire the combined BPA functions $m^*(\{G\})$, $m^*(\{F\})$, and $m^*(\Psi)$;
22: Node $i$ finds the node $j$ that matches $\min_j |m_j(\{G\}) - m^*(\{G\})|$;
23: if $c_{ij} = 1$, then $Z_i = FAULT$, else $Z_i = GOOD$;
24: end if
25: The determined node broadcasts its status if it is a good sensor.



Broadcasting not only consumes nodes' energy but also occupies channel bandwidth, so the main energy-saving measure in our algorithm is that only particular states (LG and GOOD) are broadcast in the different stages. In Step 12, only nodes whose tendency status is LG broadcast their values, because only LG neighbors participate in the final state determination in Step 14. Similarly, only good sensors broadcast their states, which saves transmission energy.
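Two of the uDFD decision rules are simple enough to sketch: the Stage 2 tendency and the Stage 4 choice of reference neighbor (Steps 22 and 23). This is a hedged illustration, not the authors' code: it assumes that Step 9 uses the same ⌈|N_i|/2⌉ majority threshold as Equation (3), and the function names and dictionary inputs are hypothetical. The entropy-based fusion that produces m*({G}) is not reproduced here; the fused value is passed in as a number.

```python
import math


def udfd_tendency(c_row):
    """Stage 2 (assumed threshold): LG if fewer than ceil(|N_i|/2) test results are 1, else Un."""
    return "LG" if sum(c_row.values()) < math.ceil(len(c_row) / 2) else "Un"


def resolve_undetermined(c_row, good_neighbor_bpa_g, fused_bpa_g):
    """Stage 4, Steps 22-23: pick the good neighbor j whose m_j({G}) is closest
    to the fused m*({G}), then decide from the test result c_ij."""
    j = min(good_neighbor_bpa_g, key=lambda k: abs(good_neighbor_bpa_g[k] - fused_bpa_g))
    return "FAULT" if c_row[j] == 1 else "GOOD"


# Usage: neighbor 3 has the m({G}) closest to the fused value 0.6, and c_i3 = 1.
print(udfd_tendency({2: 0, 3: 1, 4: 0}))                           # LG
print(resolve_undetermined({2: 0, 3: 1}, {2: 0.9, 3: 0.55}, 0.6))  # FAULT
```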

3.2.3. Missing Data Preprocessing Mechanism 

In this paper, we mainly focus on sensing faults rather than communication faults. Missing data, however, affect the accuracy of fault diagnosis. Here, a missing $x_i^{t,D}$ means that the reading has been lost because the communication module has failed, which in turn affects the reference data for other sensors' fault state determination. It is therefore necessary for node $i$ to fill in the missing data and send it to its neighbors. In this section, we use Markov decision processes based on neighbors' historical data to predict the current missing measurement values of node $i$. Markov theory can reflect the influence of random factors and extends naturally to dynamic, fluctuating stochastic processes. Exploiting these features, we combine the historical data of node $i$ with its neighbors' historical data to form a fused historical data vector, which can be adaptively adjusted according to the significance of neighbors' measurements. The Markov state transition matrix is then adopted to predict the value and sign of the reading difference between two days. The steps for missing data preprocessing are as follows:

Steps: 

(1) For each node $j \in N_i$, where $N_i$ is the set of all neighbors of node $i$, fetch the previous $m$ historical measurements of node $j$; these measurements form an $m$-dimensional vector $V_j = (x_j^{t,D-m}, x_j^{t,D-m+1}, \ldots, x_j^{t,D-1})$;

(2) Calculate the reputation value $Q_j$ of each neighbor of node $i$; that is, for each node $j \in N_i$, $Q_j = e^{-\lambda_j}$, where $\lambda_j = \frac{1}{m}\sum_{k=1}^{m} |x_i^{t,D-k} - x_j^{t,D-k}|$. Note that node $j$ has different reputation values with respect to different nodes $i$, and a smaller $\lambda_j$ increases the reputation value of node $j$;

(3) We introduce the Mahalanobis distance to evaluate the similarity between node $i$ and its neighbors; the prediction results should keep changes in the Mahalanobis distance within a predefined threshold. For each node $j \in N_i$, calculate the Mahalanobis distance $D(V_i, V_j)$ between vectors $V_i$ and $V_j$ to evaluate the similarity of node $i$ and each of its neighbors:

$$D(V_i, V_j) = \sqrt{(V_i - V_j)^{T}\, \Sigma^{-1} (V_i - V_j)}$$

where $\Sigma$ is the covariance matrix of $V_i$ and $V_j$;

(4) Assume that $V_i^*$ is a fusion of the historical measurements of node $i$ and all its neighbors, which is used in the Markov decision processes to predict the current measurement of node $i$. It is also an $m$-dimensional vector and can be calculated as follows:

$$V_i^* = \alpha V_i + \beta \sum_{j \in N_i} Q_j V_j \quad (5)$$

In this data fusion formula, the historical measurement vector $V_j$ is weighted by the reputation value of node $j$, and the factors $\alpha$ and $\beta$ ($\alpha + \beta = 1$) indicate to what extent a node trusts itself and its neighbors, respectively. Initially, $\alpha = \beta = 0.5$.

(5) Based on the fusion result of Step 4, use Markov decision processes to predict the current measurement of node $i$, obtaining $x_i^{t,D}$;

(6) For each node $j \in N_i$, recalculate the Mahalanobis distance between vectors $V_i'$ and $V_j'$, $D'(V_i', V_j') = \sqrt{(V_i' - V_j')^{T}\, \Sigma'^{-1} (V_i' - V_j')}$, where $V_i' = (x_i^{t,D-m}, \ldots, x_i^{t,D-1}, x_i^{t,D})$ and $V_j' = (x_j^{t,D-m}, \ldots, x_j^{t,D-1}, x_j^{t,D})$ are $(m+1)$-dimensional vectors and $\Sigma'$ is the covariance matrix of $V_i'$ and $V_j'$;

(7) If $D'(V_i', V_j') - D(V_i, V_j) < \theta$ for all $j \in N_i$, where $\theta$ is a predefined threshold, there is no need to adjust the fusion factor $\alpha$, and the predicted value $x_i^{t,D}$ can be adopted. Otherwise, the predicted value $x_i^{t,D}$ increases the difference between node $i$ and node $j$, so the fusion factor $\alpha$ is reduced appropriately, which increases the proportion $\beta = 1 - \alpha$ of neighbors' measurements in the calculation of $V_i^*$;

(8) If the fusion factor $\alpha$ was adjusted in Step 7, return to Step 4; otherwise, the algorithm ends.
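Steps 2 and 4 of the preprocessing can be sketched as follows. Several details are assumptions of this sketch rather than statements from the text: the reputation is taken as Q_j = exp(−λ_j) with λ_j the mean absolute gap between the two histories, the reputation weights are normalized so that V_i* stays on the scale of the measurements, and the Mahalanobis checks of Steps 3, 6 and 7 and the Markov prediction of Step 5 are left out. All names are hypothetical.

```python
import math


def reputation(own_hist, neighbor_hist):
    """Step 2 (assumed form): Q_j = exp(-lambda_j), lambda_j = mean |x_i - x_j| over m days."""
    lam = sum(abs(a - b) for a, b in zip(own_hist, neighbor_hist)) / len(own_hist)
    return math.exp(-lam)  # a smaller lambda_j gives a larger reputation


def fuse_history(own_hist, neighbor_hists, alpha=0.5):
    """Step 4: V_i* = alpha * V_i + beta * (reputation-weighted mean of the V_j), beta = 1 - alpha."""
    beta = 1.0 - alpha
    weights = {j: reputation(own_hist, hist) for j, hist in neighbor_hists.items()}
    total = sum(weights.values())
    fused = []
    for k, own_value in enumerate(own_hist):
        neighbor_value = sum(w * neighbor_hists[j][k] for j, w in weights.items()) / total
        fused.append(alpha * own_value + beta * neighbor_value)
    return fused


# Usage: node i's last three days at time t, and two neighbors' histories.
v_i = [20.0, 20.5, 21.0]
neighbors = {"a": [20.2, 20.4, 21.1], "b": [23.0, 23.5, 24.0]}
print(fuse_history(v_i, neighbors, alpha=0.6))
```

Here neighbor "b", whose history sits about 3 units away, receives a much smaller weight than neighbor "a", so the fused vector follows "a" closely.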






To predict the missing data $x_i^{t,D}$ in Step 5 of the above algorithm, we draw on Markov decision processes [15,16]. First, from $V_i^*$ calculated in Step 4, we obtain the corresponding vector $X_i = (|X_i^{D-m}|, |X_i^{D-m+1}|, \ldots, |X_i^{D-2}|)$, whose components are the absolute differences between the fused measurements of consecutive days. $X_i$ is an $(m-1)$-dimensional vector and can be regarded as a Markov chain.

Then we classify the state of each component of $X_i$ using an average-standard deviation classification method. Assume that state $s$ can be expressed as $E_s = [\min_s, \max_s]$, where $\min_s$ and $\max_s$ are the lower and upper bounds of state $s$.

Then the sample average is:

$$\mu = \frac{1}{m-1}\sum_{j=2}^{m} |X_i^{D-j}| \quad (6)$$

The standard deviation is:

$$S = \sqrt{\frac{1}{m-1}\sum_{j=2}^{m} \left(|X_i^{D-j}| - \mu\right)^2} \quad (7)$$

According to the central limit theorem [17], we divide the sliding interval of historical fault data into five states: $E_1 = [\mu - 3S, \mu - S)$, $E_2 = [\mu - S, \mu - 0.5S)$, $E_3 = [\mu - 0.5S, \mu + 0.5S)$, $E_4 = [\mu + 0.5S, \mu + S)$, and $E_5 = [\mu + S, \mu + 3S)$. The state of each component of the difference vector $X_i$ depends on which interval it belongs to.

The transition probability matrix $P^{(1)}$ can be calculated as follows. Let $M_{st}$ denote the number of samples that transfer from state $E_s$ to state $E_t$ in one step, and $M_s$ the number of samples in state $E_s$ before the transfer. Then $p_{st}^{(1)} = M_{st}/M_s$, where $p_{st}^{(1)}$ is the probability of shifting from state $E_s$ to state $E_t$ in one step. Therefore, the $5 \times 5$ transition probability matrix is:

$$P^{(1)} = \begin{pmatrix} p_{11}^{(1)} & \cdots & p_{15}^{(1)} \\ \vdots & \ddots & \vdots \\ p_{51}^{(1)} & \cdots & p_{55}^{(1)} \end{pmatrix}$$

For any component $|X_i^{D-j}|$ ($j = 1, 2, \ldots, m$), the probability distribution vector is:

$$\pi(D-j) = \left(\pi_1(D-j), \pi_2(D-j), \pi_3(D-j), \pi_4(D-j), \pi_5(D-j)\right) \quad (8)$$

Assume that $|X_i^{D-2}|$ is in state $E_3$; then its probability distribution vector is $\pi(D-2) = (0, 0, 1, 0, 0)$. Since $\pi(D-2)$ and the transition probability matrix $P^{(1)}$ are known, the probability distribution vector of $|X_i^{D-1}|$ is $\pi(D-1) = \pi(D-2) \times P^{(1)}$, and the state corresponding to $\max\{\pi_s(D-1),\ s \in \{1, 2, 3, 4, 5\}\}$ is the state to which $|X_i^{D-1}|$ belongs. If $|X_i^{D-1}|$ is in state $s$, its specific value is determined as follows:

$$|X_i^{D-1}| = \sum_{k=s-1}^{s+1} \frac{\pi_k(D-1)}{\pi_{s-1}(D-1) + \pi_s(D-1) + \pi_{s+1}(D-1)} \cdot \frac{\min_k + \max_k}{2}$$
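The five-state prediction of |X_i^{D−1}| described above can be sketched as follows: classify the observed differences into E1 to E5, estimate the one-step matrix from the counts M_st/M_s, propagate a one-hot distribution by one step, and average the midpoints of the predicted state and its adjacent states weighted by their probabilities. Assumptions of this sketch: the population standard deviation is used for S, values outside [μ − 3S, μ + 3S] fall into E1 or E5, a state that never appears before a transition is given a self-transition, and adjacent states that do not exist at the ends (next to E1 or E5) are simply dropped from the average. Names are hypothetical.

```python
from statistics import mean, pstdev


def classify(value, mu, sd):
    """State index 1..5 for E1..E5; values beyond mu +/- 3S are clamped to E1 or E5."""
    for state, upper in enumerate([mu - sd, mu - 0.5 * sd, mu + 0.5 * sd, mu + sd], start=1):
        if value < upper:
            return state
    return 5


def predict_abs_difference(history):
    """history: observed |X_i^{D-j}| values in chronological order (at least two)."""
    mu, sd = mean(history), pstdev(history)
    states = [classify(v, mu, sd) for v in history]
    counts = [[0] * 5 for _ in range(5)]          # M_st: one-step transition counts
    for a, b in zip(states, states[1:]):
        counts[a - 1][b - 1] += 1
    P = []
    for r, row in enumerate(counts):
        m_s = sum(row)                            # M_s: samples leaving state r + 1
        P.append([c / m_s for c in row] if m_s else [1.0 if k == r else 0.0 for k in range(5)])
    pi = P[states[-1] - 1]                        # one-hot pi(D-2) times P
    s = max(range(5), key=lambda k: pi[k])        # most probable state (0-based)
    lows = [mu - 3 * sd, mu - sd, mu - 0.5 * sd, mu + 0.5 * sd, mu + sd]
    highs = [mu - sd, mu - 0.5 * sd, mu + 0.5 * sd, mu + sd, mu + 3 * sd]
    near = [k for k in (s - 1, s, s + 1) if 0 <= k < 5]
    weight = sum(pi[k] for k in near)
    return sum(pi[k] * (lows[k] + highs[k]) / 2 for k in near) / weight


print(predict_abs_difference([1.0, 3.0, 1.0, 3.0, 1.0]))  # about 3.76 (midpoint of E5)
```

In the usage example, the low value has always been followed by a high one, so the sketch predicts a value in E5.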

Markov decision processes are further used to predict the sign (positive or negative) of $X_i^{D-1}$. For the corresponding vector $X_i'$ of signed differences, we define that state $E_1$ corresponds to positive and state $E_2$ corresponds to negative. The transition probability is then $p_{st}'^{(1)} = M_{st}'/M_s'$, and the transition probability matrix

$$P'^{(1)} = \begin{pmatrix} p_{11}' & p_{12}' \\ p_{21}' & p_{22}' \end{pmatrix}$$

reflects the probability of transitions between positive and negative. For any component $X_i^{D-j}$ ($j = 1, 2, \ldots, m$), the probability distribution vector is $\pi'(D-j) = (\pi_1'(D-j), \pi_2'(D-j))$. Assume that $X_i^{D-2}$ is positive; then its probability distribution vector is $\pi'(D-2) = (1, 0)$. Since $\pi'(D-2)$ and the transition probability matrix $P'^{(1)}$ are known, the probability distribution vector of the sign of $X_i^{D-1}$ is $\pi'(D-1) = \pi'(D-2) \times P'^{(1)}$, and the state corresponding to $\max\{\pi_s'(D-1),\ s \in \{1, 2\}\}$ indicates the sign of $X_i^{D-1}$.
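The two-state sign prediction can be sketched in the same way. Assumptions of this sketch: a zero difference is counted as positive, a tie between the two probabilities is resolved as positive, and when the current sign has no observed outgoing transition the sketch keeps that sign. The function name is hypothetical.

```python
def predict_sign(signed_history):
    """Return +1 or -1 for X_i^{D-1}; state 0 = positive (E1), state 1 = negative (E2)."""
    states = [0 if v >= 0 else 1 for v in signed_history]
    counts = [[0, 0], [0, 0]]                      # one-step transition counts M'_st
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    row = counts[states[-1]]
    total = sum(row)
    if total == 0:                                 # assumption: no evidence, keep current sign
        return 1 if states[-1] == 0 else -1
    pi_next = [c / total for c in row]             # one-hot pi'(D-2) times P'
    return 1 if pi_next[0] >= pi_next[1] else -1


print(predict_sign([0.4, -0.3, 0.5, -0.2, 0.6]))  # -1: positive has always been followed by negative
```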

3.2.4. Information Entropy-Based Evidence Fusion

Because Un nodes are in an uncertain status, we need a mechanism to determine their status. Dempster-Shafer evidence theory is an effective method for dealing with uncertainty problems, but its results are counterintuitive when evidences are in high conflict with each other [18,19].

In the improved evidence fusion algorithm we propose, possible events are depicted as evidences. Through combination rules, evidences are aggregated into a comprehensive belief probability assignment under uncertainty conditions. A set of hypotheses about the node status is denoted as the frame of discernment $\Theta = \{G, F\}$, where $G$ represents a good sensor and $F$ a faulty one. The power set $2^{\Theta}$ includes all subsets of $\Theta$; here $2^{\Theta} = \{\emptyset, \{G\}, \{F\}, \Psi\}$, whose elements respectively represent the hypotheses impossible, good, faulty, and uncertain. The belief probability assignment (BPA) functions of node $i$ are defined as follows:

$$m: 2^{\Theta} \to [0, 1] \quad (10)$$

$$m_i(\emptyset) = 0 \quad (11)$$

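We define the BPA function for good status, $m(\{G\})$, as a piecewise function of the expectation deviation $\xi_x$ over the intervals $0 \le \xi_x \le \mu_1 - \sigma_1$, $\mu_1 - \sigma_1 < \xi_x < \mu_1 + \sigma_1$, and $\xi_x \ge \mu_1 + \sigma_1$ (12).

Similarly, the BPA function for faulty status, $m(\{F\})$, is defined piecewise over the same intervals (13).

The BPA function for uncertainty status, $m(\Psi)$, is 0 for $\xi_x \le \mu_1 - \sigma_1$ or $\xi_x \ge \mu_1 + \sigma_1$, and takes a Gaussian form with mean $\mu_1$ and standard deviation $\sigma_1$ for $\mu_1 - \sigma_1 < \xi_x < \mu_1 + \sigma_1$ (14).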

Here we design an expectation deviation function $\xi_x$. Assume that the measurement value of nodes at time $t$ on day $D$ is a random variable $X$ with expectation $EX$ and variance $\sigma^2$. Define

$$\xi_x = \frac{|X - EX|}{\sigma}$$

that is, $\xi_x$ expresses the difference between $X$ and $EX$ as a multiple of $\sigma$. $\xi_x$ indicates the data offset between node $i$ and the average of its good neighbors: the larger $\xi_x$ is, the more probable it is that the node is faulty. As $\xi_x$ increases, $m(\{G\})$ decreases and, conversely, $m(\{F\})$ increases.
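A minimal sketch of the expectation deviation follows. Reading EX as the average of node i's good neighbors follows the remark above; estimating σ as the population standard deviation of those same readings is an assumption of this sketch, as is the function name.

```python
from statistics import mean, pstdev


def expectation_deviation(x, good_neighbor_readings):
    """xi_x = |X - EX| / sigma, with EX and sigma estimated from good neighbors' readings."""
    ex = mean(good_neighbor_readings)
    sigma = pstdev(good_neighbor_readings)
    if sigma == 0:
        raise ValueError("sigma is zero, so xi_x is undefined")
    return abs(x - ex) / sigma


print(expectation_deviation(23.0, [19.0, 21.0]))  # 3.0: X lies three sigmas from EX
```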

In Section 3.1, we discussed that one of the defects of traditional algorithms is that an indeterminacy occurs under the condition in Equation (4), so that the node cannot be classified as good or faulty. Therefore, we define the range $(\mu_1 - \sigma_1, \mu_1 + \sigma_1)$ within which the status of the node has higher uncertainty (the probability of the node being faulty is moderate), and $m(\Psi) \sim N(\mu_1, \sigma_1^2)$ when $\xi_x \in (\mu_1 - \sigma_1, \mu_1 + \sigma_1)$. When $\xi_x = \mu_1$, the uncertainty reaches its maximum (as depicted in Figure 2). The definitions of $m(\{G\})$, $m(\{F\})$ and $m(\Psi)$ express this meaning and describe well how changes in $\xi_x$ influence the evidence.
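In Equation (14), when $\lambda_x \in (\mu_i - \sigma_i, \mu_i + \sigma_i)$, we can see that:

$$ P(\lambda_x > \mu_i + \sigma_i) < e^{-\lambda_x} \quad (15) $$

According to the Chebyshev inequality:

$$ P\{|X - EX| \ge \varepsilon\} \le \frac{\sigma^2}{\varepsilon^2} \quad (16) $$

Let $\varepsilon^2 = e^{\lambda_x}\sigma^2$; then $\varepsilon = e^{\lambda_x/2}\sigma$, and Equation (16) expands as follows:

$$ P\{|X - EX| \ge e^{\lambda_x/2}\sigma\} \le e^{-\lambda_x} \quad (17) $$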
Figure 2. The BPA functions. 
According to Equations (15) and (16), we obtain the value of $\mu_i + \sigma_i$. Here, we set $\sigma_i = 0.1$, which gives $\mu_i = 0.68$ (so that $\mu_i + \sigma_i \approx 0.78$). With these values, $m(\{G\})$, $m(\{F\})$ and $m(\psi)$ can be calculated by Equations (12)-(14), respectively.
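To make the three ranges concrete, the following Python sketch computes $\lambda_x$ and reports which range of Equations (12)-(14) a reading falls into, using $\mu_i = 0.68$ and $\sigma_i = 0.1$ from the text. It assumes the definition $\lambda_x = |X - EX|/\sigma$ given above; the helper names are hypothetical, and it only identifies the range rather than computing the masses themselves.

```python
# Hypothetical helpers; lambda_x = |X - EX| / sigma as defined above.
MU_I = 0.68     # mu_i, from the text
SIGMA_I = 0.1   # sigma_i, from the text


def expectation_deviation(x: float, ex: float, sigma: float) -> float:
    """lambda_x: the gap between a reading X and its expectation EX, in units of sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return abs(x - ex) / sigma


def bpa_region(lam: float, mu: float = MU_I, sigma: float = SIGMA_I) -> str:
    """Return which range of Equations (12)-(14) lambda_x falls into."""
    if lam < mu - sigma:
        return "below"        # m(psi) = 0
    if lam > mu + sigma:
        return "above"        # m(psi) = 0
    return "uncertain"        # m(psi) > 0, peaking at lambda_x = mu


print(bpa_region(expectation_deviation(25.5, 25.0, 1.0)))  # below
print(bpa_region(expectation_deviation(25.7, 25.0, 1.0)))  # uncertain
```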

In D-S evidence theory, when two or more BPAs need to be combined, the combination rule is defined as follows:

$$ m(A) = \frac{1}{1-K} \sum_{A_1 \cap \cdots \cap A_n = A} \prod_{i=1}^{n} m_i(A_i), \quad A \ne \emptyset \quad (18) $$

where $K$ is the mass assigned to the empty set $\emptyset$, $K = \sum_{A_1 \cap \cdots \cap A_n = \emptyset} \prod_{i=1}^{n} m_i(A_i)$. However, the traditional Dempster-Shafer rule has an obvious disadvantage when used in our fault detection algorithm. For example:

m1: m(G) = 0.8, m(F) = 0.2, m(ψ) = 0,
m2: m(G) = 0.8, m(F) = 0.2, m(ψ) = 0.

The fused result is m(G) = 0.94, m(F) = 0.06, m(ψ) = 0: traditional Dempster-Shafer fusion pushes the belief in F to an extreme. Bluntly discarding the conflict and then normalizing exaggerates the difference between G and F, which causes errors when uDFD judges sensor states, because node $i$ looks for the node $j$ that minimizes $(m_i(\{G\}) - m_j(\{G\}))^2$, and overly extreme evidence distorts this comparison of comprehensive evidence. We therefore propose a new evidence fusion rule combined with information entropy theory. According to the conflict of each evidence with the whole body of evidence, measured by information divergence, we classify evidences into several sets. Fusing the results of the different sets prevents the difference between G and F from being exaggerated. Using this evidence fusion algorithm, we can finally determine the nodes' status.
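The counter-example above can be checked with a small Python sketch of Dempster's rule (Equation (18)). Focal elements are modeled as frozensets over {G, F}, with `PSI` standing for the uncertainty element; all names are illustrative and not part of uDFD itself.

```python
from functools import reduce
from itertools import product

# Focal elements of the frame Theta = {G, F}; PSI (= Theta) carries the uncertainty mass.
G, F = frozenset({"G"}), frozenset({"F"})
PSI = frozenset({"G", "F"})


def dempster(m1: dict, m2: dict) -> dict:
    """Combine two BPAs with Dempster's rule, Equation (18)."""
    fused, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            fused[inter] = fused.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb          # K: mass that falls on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: the evidences cannot be combined")
    return {k: v / (1.0 - conflict) for k, v in fused.items()}


def dempster_all(bpas):
    """Dempster's rule is commutative and associative, so BPAs can be folded pairwise."""
    return reduce(dempster, bpas)


fused = dempster_all([{G: 0.8, F: 0.2, PSI: 0.0}, {G: 0.8, F: 0.2, PSI: 0.0}])
print(round(fused[G], 2), round(fused[F], 2))  # 0.94 0.06
```

Here K = 0.8 × 0.2 + 0.2 × 0.8 = 0.32, so m(F) = 0.04 / 0.68 ≈ 0.06 even though each evidence alone assigns 0.2 to F.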

In classical information theory, Shannon entropy measures the amount of information, which in turn reflects the uncertainty of random events. Since different evidences should be assigned different fusion weights according to their amounts of information, in this section the theory of entropy and the degree-of-disagreement function, which measures information discrepancy, are introduced into the combination rule to handle evidence conflicts and to increase the accuracy of fault determination for Un nodes. First, we introduce some definitions. The information divergence $D(p \| q)$ between discrete random variables $p$ and $q$ is defined as [20]:

$$ D(p \| q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)} \quad (19) $$

Clearly, $D(p \| q) \ge 0$. Assume that $M_i$ denotes the $i$th evidence and $M_i = (m_{1i}, m_{2i}, \ldots, m_{ni})$, where $m_{ji}$ is the mass of the $j$th focal element. Here, $\sum_{j=1}^{n} m_{ji} = 1$, $m_{ji} \ge 0$, $1 \le j \le n$, $1 \le i \le s$, where $n$ is the number of focal elements in each evidence and $s$ is the number of evidences.
$D_i$ is defined as follows:

$$ D_i = \frac{1}{s} \sum_{l=1}^{s} D(M_i \| M_l) = \frac{1}{s} \sum_{l=1}^{s} \sum_{j=1}^{n} m_{ji} \ln \frac{m_{ji}}{m_{jl}} \quad (20) $$

It indicates the degree of difference between $M_i$ and the whole body of evidence and is the average information divergence between $M_i$ and each evidence. Next, define $\delta_i$ as the proportion of the total difference degree that $M_i$ accounts for:

$$ \delta_i = D_i \Big/ \sum_{l=1}^{s} D_l \quad (21) $$

According to $\delta_i$, evidences are classified into several subsets, with evidences that have similar $\delta_i$ aggregated in the same subset. Before classification, the demarcation point $\Delta$, the average pairwise difference between the $\delta_i$, is computed as:

$$ \Delta = \frac{2}{s(s-1)} \sum_{l=1}^{s-1} \sum_{i=l+1}^{s} |\delta_l - \delta_i| \quad (22) $$



Assume that $C_r$ is the resulting subset and $P$ is the collection of $\delta_i$. The pseudocode of the classification algorithm is as follows:

1: r = 0;
2: While P is not empty, do
3:   Randomly choose one element from P, put it into C_r and denote it as C_r1;
4:   Remove C_r1 from P;
5:   Loop1, for each δ_i still in P
6:     Loop2, for j = 1 to |C_r| (|C_r| is the cardinality of C_r)
7:       If |δ_i - C_rj| > Δ, then continue Loop1;
8:       End if;
9:     End Loop2;
10:    Put δ_i into C_r and remove it from P;
11:  End Loop1;
12:  r++;
13: End while.

After classification, the difference between the $\delta_i$ of any two evidences in the same subset is at most $\Delta$; that is, evidences within each subset conflict less. Assume that there are $m$ subsets, and the weight of each subset is defined as $CW_r = |C_r|/s$, where $|C_r|$ is the number of evidences in subset $C_r$.

Evidences in each subset are fused with weights to obtain the aggregative center of $C_r$; the weighting is based on information entropy.
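The classification procedure and the subset weights $CW_r$ can be sketched in Python as follows. The seeded random generator, the function names and the toy δ values are illustrative assumptions; Loop1 iterates only over the elements still in $P$, which is what the procedure requires.

```python
import random


def classify(delta, threshold, rng=None):
    """Partition evidence indices so that, within each subset C_r, every pair of
    delta values differs by at most `threshold` (the demarcation point Delta)."""
    rng = rng or random.Random(0)          # assumption: seeded for reproducibility
    pending = list(range(len(delta)))      # P, stored as evidence indices
    subsets = []
    while pending:                         # step 2
        current = [pending.pop(rng.randrange(len(pending)))]  # steps 3-4
        for i in list(pending):            # Loop1 over the elements still in P
            if all(abs(delta[i] - delta[j]) <= threshold for j in current):  # Loop2
                current.append(i)          # step 10: move delta_i from P into C_r
                pending.remove(i)
        subsets.append(current)            # step 12: r++
    return subsets


def subset_weights(subsets, s):
    """CW_r = |C_r| / s."""
    return [len(c) / s for c in subsets]


groups = classify([0.10, 0.12, 0.50, 0.52, 0.90], threshold=0.05)
print(sorted(sorted(g) for g in groups), subset_weights(groups, 5))
```

Because each new member must lie within Δ of every current member, the result can depend on which element is drawn first when δ values form a chain; the toy values above are well separated, so the grouping is the same for any draw.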

Information entropy indicates the amount of information an evidence carries: the larger the information entropy, the less information the evidence carries. The information entropy of evidence $M_i = (m_{1i}, \ldots, m_{ni})$ is calculated as follows [21]:

$$ H_i = -\sum_{j=1}^{n} m_{ji} \ln m_{ji} \quad (23) $$

If $\Theta$ is a focal element in $M_i$, a larger $m_i(\Theta)$ means that $M_i$ carries less information, so the amount of information of $M_i$ is calculated as:

$$ I_i = [1 - m_i(\Theta)] \times f(H_i) = [1 - m_i(\Theta)] \times e^{-H_i} \quad (24) $$

A smaller weight is assigned to an evidence that carries less information, so the weight of each evidence in $C_r$ is:

$$ V_i = \begin{cases} I_i \Big/ \sum_{l=1}^{|C_r|} I_l, & \text{if } \sum_{l=1}^{|C_r|} I_l \ne 0 \\ 1/|C_r|, & \text{if } I_l = 0 \ \ \forall l \end{cases} \quad (25) $$
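Equations (23)-(25) translate directly into a short Python sketch. It assumes the natural logarithm for $H_i$ and that $\Theta$ is the last entry of each mass vector; the function names are hypothetical.

```python
import math


def entropy(masses):
    """H_i, Equation (23): Shannon entropy with the natural log (0 * log 0 taken as 0)."""
    return -sum(m * math.log(m) for m in masses if m > 0)


def information(masses, theta_index):
    """I_i, Equation (24): [1 - m_i(Theta)] * exp(-H_i)."""
    return (1.0 - masses[theta_index]) * math.exp(-entropy(masses))


def evidence_weights(subset, theta_index=2):
    """V_i, Equation (25): normalized information, uniform when every I_i is zero."""
    info = [information(m, theta_index) for m in subset]
    total = sum(info)
    if total == 0:
        return [1.0 / len(subset)] * len(subset)
    return [i / total for i in info]


# Evidence set 1 of Table 1, masses ordered as (G, F, Theta).
weights = evidence_weights([(0.1, 0.2, 0.7), (0.2, 0.2, 0.6)])
print(weights)
```

In this example the second evidence puts less mass on $\Theta$, so it receives the larger weight.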



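Then we obtain the aggregative center of $C_r$ using an improved D-S formula. For any focal element $A$, the fusion result is $m_{C_r}(A) = p(A) + K q(A)$, where:

$$ p(A) = \sum_{A_1 \cap \cdots \cap A_{|C_r|} = A} \ \prod_{i=1}^{|C_r|} m_i(A_i) \quad (26) $$

$$ q(A) = \sum_{i=1}^{|C_r|} V_i \times m_i(A) \quad (27) $$

$$ K = \sum_{A_1 \cap \cdots \cap A_{|C_r|} = \emptyset} \ \prod_{i=1}^{|C_r|} m_i(A_i) \quad (28) $$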

Here, $p(A)$ represents the traditional (unnormalized) way of fusing evidences and $q(A)$ represents the weighted average support of all evidences for $A$. The larger $K$ is, the greater the influence of $q(A)$. Assume that there are $m$ subsets and $m_r$ is the aggregative center of $C_r$; then the final fusion result is:

$$ m(A) = \frac{\sum_{r=1}^{m} CW_r \times m_r(A)}{\sum_{B \subseteq \Theta} \sum_{r=1}^{m} CW_r \times m_r(B)} \quad (29) $$
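The aggregative center (Equations (26)-(28)) and the final combination (Equation (29)) can be sketched in Python as follows. The equal weights $V_i = 0.5$ in the usage line are chosen only for illustration (in practice they would come from Equation (25)), the function names are hypothetical, and the numbers produced are not claimed to reproduce Table 1.

```python
from itertools import product

G, F = frozenset({"G"}), frozenset({"F"})
PSI = frozenset({"G", "F"})   # Theta: the uncertainty element


def aggregative_center(bpas, weights):
    """m_{C_r}(A) = p(A) + K * q(A), Equations (26)-(28), for the BPAs of one subset."""
    p, conflict = {}, 0.0
    for combo in product(*(b.items() for b in bpas)):
        inter = frozenset.intersection(*(a for a, _ in combo))
        mass = 1.0
        for _, m in combo:
            mass *= m
        if inter:
            p[inter] = p.get(inter, 0.0) + mass     # unnormalized conjunctive sum, Eq. (26)
        else:
            conflict += mass                        # K, Eq. (28)
    keys = set(p).union(*(b.keys() for b in bpas))
    q = {a: sum(w * b.get(a, 0.0) for w, b in zip(weights, bpas)) for a in keys}  # Eq. (27)
    return {a: p.get(a, 0.0) + conflict * q[a] for a in keys}


def final_fusion(centers, sizes):
    """Equation (29): centers weighted by CW_r = |C_r| / s, then normalized."""
    s = sum(sizes)
    fused = {}
    for center, size in zip(centers, sizes):
        for a, m in center.items():
            fused[a] = fused.get(a, 0.0) + (size / s) * m
    total = sum(fused.values())
    return {a: m / total for a, m in fused.items()}


center = aggregative_center(
    [{G: 0.1, F: 0.2, PSI: 0.7}, {G: 0.2, F: 0.2, PSI: 0.6}], weights=[0.5, 0.5]
)
print(final_fusion([center], sizes=[2]))
```

Unlike Equation (18), the conflict mass K is redistributed in proportion to the weighted average support q(A) instead of being removed by normalization, which keeps G and F from being pushed apart.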
4. Simulation Analysis

4.1. Simulation Setting 

We use MATLAB to simulate our model. As shown in Figure 3, a 100 m × 100 m square is constructed, in which sensors are deployed to form the network. Ten temperature sources are deployed in the square as the sensing objects of the sensors, with the distance between any two sources no less than L. Each temperature source randomly generates temperature data x ranging from -5 to 40 °C. These readings simulate the temperature variation over the four seasons, so they exhibit regularity and smoothness. Then, n sensors are deployed in the square, and each of them selects the nearest temperature source within its sensing range. If no temperature source exists within its sensing range, the sensor is set as not working, which means it neither senses a temperature source nor communicates with neighbor nodes.

Figure 3. Topology description. 
The range within which each working node's sensed data vary depends on its distance to its temperature source, as described by the formulas below:

$$ X_{max} = x + d/10 \quad (30) $$

$$ X_{min} = x - d/10 \quad (31) $$

$X_{max}$ and $X_{min}$ are the upper and lower bounds of the data range, respectively, $x$ is the temperature generated by the temperature source, and $d$ is the distance between a sensor and its temperature source. At every sensing moment, a sensor chooses a random value uniformly distributed in $(X_{min}, X_{max})$ as its sensing data.

Each sensor node chooses as its neighbors the nodes that are within its communication range (the communication radius is denoted by R) and share its temperature source. Each node then creates its set of neighbor nodes, and the wireless sensor network is formed.

Two cases are simulated: uniformly distributed fault nodes and intensively distributed fault nodes. The first case is used for comparison as the number of nodes varies. In the second case, we set squares located at random coordinates as fault regions and compare the detection effects for different scales of intensive faults by changing the area of the square. According to the sensing data design, data ranging from $X_{min}$ to $X_{max}$ are treated as good; otherwise, data are treated as faulty. Fault data are set to $X_{max} + 5d/r$ or $X_{min} - 5d/r$. Parameters are initialized as L = 30 m, r = 20 m and R = 20 m. In our simulation, each final result is the average of 30 repeated runs.
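The sensing and fault-injection model of this section can be sketched in Python as follows. Choosing the upper or lower fault value with equal probability, and the helper names, are assumptions made for illustration.

```python
import random


def reading_bounds(x, d):
    """Equations (30)-(31): X_min = x - d/10, X_max = x + d/10."""
    return x - d / 10.0, x + d / 10.0


def good_reading(x, d, rng):
    """A good sensor reports a value drawn uniformly from [X_min, X_max]."""
    lo, hi = reading_bounds(x, d)
    return rng.uniform(lo, hi)


def faulty_reading(x, d, r, rng):
    """Fault data: X_max + 5d/r or X_min - 5d/r (assumption: each with probability 1/2)."""
    lo, hi = reading_bounds(x, d)
    return hi + 5 * d / r if rng.random() < 0.5 else lo - 5 * d / r


def is_good(value, x, d):
    """Data inside [X_min, X_max] are treated as good."""
    lo, hi = reading_bounds(x, d)
    return lo <= value <= hi


rng = random.Random(1)
x, d, r = 22.0, 12.0, 20.0   # source temperature (deg C), distance (m), sensing radius (m)
print(good_reading(x, d, rng), faulty_reading(x, d, r, rng))
```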

4.2. Simulation Result Analysis 

4.2.1. Effect of Data Loss

At each data collection moment, some nodes are chosen to be unable to sense data, simulating data loss. The Data Missing Preprocess Mechanism proposed in this paper is compared with the Data Filling method based on the k-Nearest Neighbor algorithm (df-KNN). The main idea of df-KNN is to select the k nodes in the neighborhood with the shortest distances, weight their data according to these distances and sum the weighted data as the interpolation result. Here, the data loss rate is set to 5%, 10%, 15%, 20%, 25%, 30% and 35%, respectively.
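For reference, a minimal Python sketch of the df-KNN baseline as described above follows. The text only says that the k nearest readings are weighted by distance; inverse-distance weighting, the function name and the sample coordinates are assumptions.

```python
import math


def df_knn_fill(target, neighbors, k=3):
    """Estimate a missing reading at position `target` from neighbors with data.

    neighbors: list of ((x, y), value). Inverse-distance weighting is an assumption.
    """
    if not neighbors:
        raise ValueError("no neighbor data available")
    nearest = sorted(neighbors, key=lambda nv: math.dist(target, nv[0]))[:k]
    weights, values = [], []
    for pos, value in nearest:
        dist = math.dist(target, pos)
        if dist == 0:
            return value                   # co-located neighbor: use its reading
        weights.append(1.0 / dist)         # closer neighbors count more
        values.append(value)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


nbrs = [((0.0, 3.0), 20.0), ((3.0, 0.0), 22.0), ((0.0, -3.0), 24.0), ((30.0, 30.0), 99.0)]
print(df_knn_fill((0.0, 0.0), nbrs, k=3))   # the far neighbor (99.0) is ignored
```

Because df-KNN relies only on current neighbor readings, its estimate degrades as fewer neighbors report data, which is the situation examined below.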

Figure 4. Effects of data filling. 
As shown in Figure 4, the horizontal axis is the data loss rate, i.e., the ratio of the number of data loss nodes to the total number of working nodes. The vertical axis is the mean residual, i.e., the average difference between interpolated data and the pre-established data, which reflects the final accuracy of the algorithms. The mean residual grows as the loss rate grows. When the loss rate is low, the mean residual of uDFD is 0.1 lower than that of df-KNN, an unremarkable improvement, but the improvement increases as the loss rate grows. When the loss rate is high enough, the improvement is approximately 0.5, which means uDFD is better suited to large-scale data loss. As the data loss rate grows, the number of neighbors with available data decreases, so less information can be collected and the interpolation results eventually become unreliable. In comparison with df-KNN, uDFD makes full use of the historical data of neighbors for prediction, which mitigates the loss of credibility caused by scarce available data and leads to better results.
4.2.2. Evidence Fusion

In this paper, information theory-based evidence reasoning is used to fuse the collected evidences before the status of nodes is judged. The original D-S evidence reasoning and an improved version proposed by Qiang Ma et al. [22] are used for comparison. The improved D-S method is described below.

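Define the distance between evidences $m_1$ and $m_2$:

$$ d(m_1, m_2) = \sqrt{\tfrac{1}{2}(m_1 - m_2)^{T}(m_1 - m_2)} \quad (32) $$

Define the similarity of $m_1$ and $m_2$:

$$ S(m_1, m_2) = 1 - d(m_1, m_2) \quad (33) $$

Define the basic credit of $m_i$:

$$ \alpha_i = \sum_{j=1, j \ne i}^{N} S(m_i, m_j) \quad (34) $$

Here $N$ is the total number of evidences.

Define the weight of $m_i$:

$$ \varphi_i = \alpha_i \Big/ \max_{j} \alpha_j \quad (35) $$

Amend all evidences:

$$ m_i'(A) = \varphi_i m_i(A) \quad (36) $$

$$ m_i'(\psi) = \varphi_i m_i(\psi) + (1 - \varphi_i) \quad (37) $$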

$A$ is an established focal element and $\psi$ is the uncertain one. The amended evidences are then fused with the original D-S rule. This algorithm measures the degree of conflict among evidences by distance and amends the evidences before fusion. The simulation results are shown in Table 1.
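A minimal Python sketch of this comparison method follows, implementing Equations (32)-(37) as written here and then fusing with Dempster's rule. Evidences are tuples (m(G), m(F), m(ψ)); the function names are hypothetical, and the sketch is not guaranteed to reproduce the set 2 and set 3 values reported for this method in Table 1.

```python
import math


def evidence_distance(a, b):
    """d(m1, m2) = sqrt(0.5 * (m1 - m2)^T (m1 - m2)), Equation (32)."""
    return math.sqrt(0.5 * sum((x - y) ** 2 for x, y in zip(a, b)))


def amend(bpas):
    """Equations (33)-(37): similarity, basic credit, weight phi_i, amended BPAs."""
    n = len(bpas)
    credit = [sum(1.0 - evidence_distance(bpas[i], bpas[j]) for j in range(n) if j != i)
              for i in range(n)]
    top = max(credit)
    amended = []
    for (g, f, psi), c in zip(bpas, credit):
        phi = c / top if top > 0 else 1.0
        # Mass removed from each hypothesis is moved to the uncertain element psi.
        amended.append((phi * g, phi * f, phi * psi + (1.0 - phi)))
    return amended


def dempster_gfp(a, b):
    """Dempster's rule on Theta = {G, F} with BPAs (m(G), m(F), m(psi))."""
    g1, f1, p1 = a
    g2, f2, p2 = b
    k = g1 * f2 + f1 * g2                   # conflicting mass
    return tuple(v / (1.0 - k) for v in (g1 * g2 + g1 * p2 + p1 * g2,
                                         f1 * f2 + f1 * p2 + p1 * f2,
                                         p1 * p2))


def fuse_amended(bpas):
    amended = amend(bpas)
    result = amended[0]
    for b in amended[1:]:
        result = dempster_gfp(result, b)
    return result


print(fuse_amended([(0.1, 0.2, 0.7), (0.2, 0.2, 0.6)]))
```

For evidence set 1, both evidences receive the same credit, so $\varphi_i = 1$, no amendment occurs and the result coincides with the original D-S rule, as Table 1 shows.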

The belief function and the plausibility function are used to evaluate the fusion results. The BPA-based belief function in the frame of discernment $\Theta$ is defined as:

$$ Bel(A) = \sum_{B \subseteq A} m(B) \quad (38) $$

The BPA-based plausibility function in the frame of discernment $\Theta$ is defined as:

$$ Pl(A) = \sum_{B \cap A \ne \emptyset} m(B) \quad (39) $$

The belief interval is defined as [Bel(A), Pl(A)], as shown in Figure 5. The hypothesis that A is true is accepted in [0, Bel(A)], uncertain in [Bel(A), Pl(A)] and refused in [Pl(A), 1]. The length of each interval represents the possibility of drawing the corresponding conclusion about this hypothesis.
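Equations (38) and (39) can be checked with a few lines of Python, using a frozenset representation of focal elements (an illustrative choice). For example, with the uDFD result for evidence set 1 in Table 1, Pl(G) = m(G) + m(ψ) = 0.6912, which matches Table 2.

```python
G, F = frozenset({"G"}), frozenset({"F"})
PSI = frozenset({"G", "F"})


def belief(m, a):
    """Bel(A), Equation (38): total mass of focal elements contained in A."""
    return sum(v for b, v in m.items() if b <= a)


def plausibility(m, a):
    """Pl(A), Equation (39): total mass of focal elements that intersect A."""
    return sum(v for b, v in m.items() if b & a)


m = {G: 0.2271, F: 0.3088, PSI: 0.4641}   # uDFD result for evidence set 1 (Table 1)
print(belief(m, G), plausibility(m, G))
```

On this two-element frame, Pl(G) = 1 - Bel(F), so the accept, uncertain and refuse intervals for G follow directly from m(G) and m(F).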

Figure 5. Belief interval. 
Table 1. Results of evidence fusion.

| Algorithm | Evidence Set 1 | Evidence Set 2 | Evidence Set 3 |
| --- | --- | --- | --- |
| Evidences | m1: m(G) = 0.1, m(F) = 0.2, m(ψ) = 0.7; m2: m(G) = 0.2, m(F) = 0.2, m(ψ) = 0.6 | m1: m(G) = 0.1, m(F) = 0.2, m(ψ) = 0.7; m2: m(G) = 0.2, m(F) = 0.2, m(ψ) = 0.6; m3: m(G) = 0.1, m(F) = 0.1, m(ψ) = 0.8 | m1: m(G) = 0.1, m(F) = 0.2, m(ψ) = 0.7; m2: m(G) = 0.2, m(F) = 0.2, m(ψ) = 0.6; m3: m(G) = 0.6, m(F) = 0.2, m(ψ) = 0.2 |
| Original D-S | m(G) = 0.234, m(F) = 0.3191, m(ψ) = 0.4468 | m(G) = 0.2703, m(F) = 0.3514, m(ψ) = 0.3784 | m(G) = 0.5978, m(F) = 0.2849, m(ψ) = 0.1173 |
| Improved D-S by Qiang Ma | m(G) = 0.234, m(F) = 0.3191, m(ψ) = 0.4468 | m(G) = 0.2561, m(F) = 0.3388, m(ψ) = 0.4051 | m(G) = 0.2881, m(F) = 0.2909, m(ψ) = 0.4210 |
| uDFD | m(G) = 0.2271, m(F) = 0.3088, m(ψ) = 0.4641 | m(G) = 0.1956, m(F) = 0.2272, m(ψ) = 0.5772 | m(G) = 0.3348, m(F) = 0.2679, m(ψ) = 0.3972 |

Table 2 gives the belief and plausibility functions derived from these fusion results.
Table 2. Belief functions and plausibility functions of fusion results. 



| Algorithm | Evidence Set 1 | Evidence Set 2 | Evidence Set 3 |
| --- | --- | --- | --- |
| Original D-S | Bel(G) = 0.234, Pl(G) = 0.6808; Bel(F) = 0.3191, Pl(F) = 0.7659; Bel(ψ) = 0.4468, Pl(ψ) = 1 | Bel(G) = 0.2703, Pl(G) = 0.6486; Bel(F) = 0.3514, Pl(F) = 0.7291; Bel(ψ) = 0.3784, Pl(ψ) = 1 | Bel(G) = 0.5978, Pl(G) = 0.7151; Bel(F) = 0.2849, Pl(F) = 0.4022; Bel(ψ) = 0.1173, Pl(ψ) = 1 |
| Improved D-S by Qiang Ma | Bel(G) = 0.234, Pl(G) = 0.6808; Bel(F) = 0.3191, Pl(F) = 0.7659; Bel(ψ) = 0.4468, Pl(ψ) = 1 | Bel(G) = 0.2561, Pl(G) = 0.6612; Bel(F) = 0.3388, Pl(F) = 0.7439; Bel(ψ) = 0.4051, Pl(ψ) = 1 | Bel(G) = 0.2881, Pl(G) = 0.7091; Bel(F) = 0.2909, Pl(F) = 0.7119; Bel(ψ) = 0.4210, Pl(ψ) = 1 |
| uDFD | Bel(G) = 0.2271, Pl(G) = 0.6912; Bel(F) = 0.3088, Pl(F) = 0.7729; Bel(ψ) = 0.4641, Pl(ψ) = 1 | Bel(G) = 0.1956, Pl(G) = 0.7728; Bel(F) = 0.2272, Pl(F) = 0.8044; Bel(ψ) = 0.5772, Pl(ψ) = 1 | Bel(G) = 0.3348, Pl(G) = 0.7321; Bel(F) = 0.2679, Pl(F) = 0.6652; Bel(ψ) = 0.3972, Pl(ψ) = 1 |



The results of the three algorithms are similar when the two low-conflict evidences in evidence set 1 are fused; the results of the original D-S and of the improved D-S proposed by Qiang Ma [22] are identical. A similar evidence is added to set 1 to form set 2. From the belief and plausibility functions of the three algorithms, the possibility that the original D-S accepts G as true is 0.2703 and the possibility that it refuses G is 0.3514. For the improved D-S, these possibilities are 0.2561 and 0.3388; for uDFD, they are 0.1956 and 0.2272. For evidence set 2, the possibilities of accepting and refusing G under the first two algorithms are too high to reflect the actual situation of each evidence (in which the possibilities of accepting and refusing are both at most 0.2), whereas the result of uDFD is closer. In uDFD, the possibilities of accepting and refusing are not inflated by reducing the uncertainty, which makes the fusion result more credible. A completely different evidence is added to set 1 to form set 3, which has a high degree of conflict. The possibility (0.5978) that the original D-S accepts G as true is so high that it approaches that of the last evidence added in set 3, while the possibility (0.2881) of the improved D-S is too low to reflect the effect of the high degree of conflict. The result of uDFD lies between those of the other two algorithms, balancing the influence of all evidences, and is therefore more credible.

4.2.3. Detection Accuracy 

DFD, IDFD and uDFD are compared on the constructed wireless sensor network model. The metrics are detection accuracy (the ratio of the number of correctly detected nodes to the total number of working nodes), false alarm rate (the ratio of the number of good nodes misjudged as faulty to the total number of working nodes) and missing alarm rate (the ratio of the number of faulty nodes misjudged as good to the total number of working nodes). In this simulation, each final data point is the average of 30 repeated runs.
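The three metrics can be expressed as a small Python function; the label strings and the function name are illustrative assumptions.

```python
def detection_metrics(truth, predicted):
    """Return (detection accuracy, false alarm rate, missing alarm rate), each
    normalized by the number of working nodes. Labels are 'good' or 'faulty'."""
    n = len(truth)
    if n == 0 or n != len(predicted):
        raise ValueError("need one prediction per working node")
    pairs = list(zip(truth, predicted))
    correct = sum(t == p for t, p in pairs)
    false_alarms = sum(t == "good" and p == "faulty" for t, p in pairs)
    missed = sum(t == "faulty" and p == "good" for t, p in pairs)
    return correct / n, false_alarms / n, missed / n


truth = ["good", "good", "faulty", "faulty"]
pred = ["good", "faulty", "faulty", "good"]
print(detection_metrics(truth, pred))   # (0.5, 0.25, 0.25)
```

Because all three rates are normalized by the number of working nodes and only two labels exist, they always sum to 1.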

First, we analyze the performance of the three algorithms under varying fault rates when nodes are distributed uniformly at random. Since different distribution densities have different influences, cases with 40, 80 and 120 working nodes are simulated.

Figure 6 shows the detection effects of the three algorithms with 40 working nodes, in which case nodes are sparsely distributed in the simulation region. As the fault rate increases, detection accuracy shows an approximately linear downward trend, whereas the false alarm rate and missing alarm rate show the opposite trend. Further calculation shows that, for fault rates from 5% to 50%, the average detection accuracy of uDFD is 9.33 percentage points higher than that of DFD and 6.25 percentage points higher than that of IDFD; its average false alarm rate is 6.93 percentage points lower than that of DFD and 5.33 percentage points lower than that of IDFD; and its average missing alarm rate is 2.49 percentage points lower than that of DFD and 1.02 percentage points lower than that of IDFD. Thus uDFD achieves better detection accuracy under sparse distribution conditions.

Figure 6. Detection effects under 40 working node conditions. 
Figure 7 shows the detection effects of the three algorithms with 80 working nodes, in which case nodes are moderately densely distributed in the simulation region. As the fault rate increases, detection accuracy shows an approximately linear downward trend, whereas the false alarm rate and missing alarm rate show the opposite trend. For fault rates from 5% to 50%, the average detection accuracy of uDFD is 7.31 percentage points higher than that of DFD and 5.44 percentage points higher than that of IDFD; its average false alarm rate is 4.21 percentage points lower than that of DFD and 3.4 percentage points lower than that of IDFD; and its average missing alarm rate is 3.22 percentage points lower than that of DFD and 2.04 percentage points lower than that of IDFD. uDFD therefore provides better detection accuracy in this condition. Moreover, the advantage of uDFD grows with the fault rate: at a fault rate of 50%, its detection accuracy is 10.08 percentage points higher than that of DFD and 7.68 percentage points higher than that of IDFD, indicating that uDFD adapts better to high fault rates.

Figure 7. Detection effects under 80 working node conditions. 
Figure 8 shows the detection effects of the three algorithms with 120 working nodes, in which case nodes are densely distributed in the simulation region. As the fault rate increases, detection accuracy shows an approximately linear downward trend, whereas the false alarm rate and missing alarm rate show the opposite trend. For fault rates from 5% to 50%, the average detection accuracy of uDFD is 3.75 percentage points higher than that of DFD and 1.74 percentage points higher than that of IDFD; its average false alarm rate is 1.69 percentage points lower than that of DFD and 1.14 percentage points lower than that of IDFD; and its average missing alarm rate is 2.09 percentage points lower than that of DFD and 0.6 percentage points lower than that of IDFD. uDFD therefore provides better detection accuracy under dense distribution conditions. Furthermore, the advantage of uDFD keeps increasing as the fault rate grows, which indicates that uDFD performs better under high fault rates.
Figure 8. Detection effects under 120 working node conditions.




A comprehensive analysis of Figures 6-8 shows that the detection accuracy of all three algorithms increases with node density, because more neighbor nodes provide more information for judgment. At the same time, the accuracy advantage of uDFD grows as node density decreases: its average detection accuracy exceeds that of DFD by 9.33, 7.31 and 3.75 percentage points with 40, 80 and 120 working nodes, respectively. Thanks to evidence fusion based on the status of neighbor nodes before judgment, uDFD works better when nodes have few neighbors.

Second, we analyze the detection accuracy of the three algorithms when fault nodes are intensively distributed. In this scheme, squares located at random coordinates with side lengths of 20, 25, 30, 35 and 40 m are set as fault regions, and all nodes within a fault region are set as faulty. By changing the size of the fault region, we can observe the detection accuracy for different scales of faulty nodes. Here, the number of working nodes is 80.

Figure 9 shows the detection effects of the three algorithms with different fault region sizes. For fault rates from 5% to 50%, the average detection accuracy of uDFD is 2.08 percentage points higher than that of DFD and 1.55 percentage points higher than that of IDFD; its average false alarm rate is 0.84 percentage points lower than that of DFD and 0.45 percentage points lower than that of IDFD; and its average missing alarm rate is 1.24 percentage points lower than that of DFD and 1.09 percentage points lower than that of IDFD. Hence uDFD achieves better detection effects when intensive faults occur: when dealing with intensive faults, it draws on more information from the neighborhood, which reduces the influence of mutual cheating among faulty nodes.

In uDFD, $\theta_1$ is the threshold on data from different nodes at the same moment, $\theta_2$ is the threshold on data from the same node at different moments, and $\theta_3$ is the threshold on data from the same node collected at the same moment on different days. In our model, values of $\theta_2$ within the interval (0, 2) have little influence on the detection effect, so $\theta_2$ is fixed at 1. On this basis, we explore the best combination of $\theta_1$ and $\theta_3$.

Sensors 2014, 14

The selection of $\theta_1$ and $\theta_3$ directly affects the judgment results, so finding the best combination of $\theta_1$ and $\theta_3$ is key to achieving the best detection effect of uDFD. Figure 10 shows a 3D map of detection accuracy for different combinations of $\theta_1$ and $\theta_3$, with 80 working nodes. The surface shows that the combination $\theta_1 = 5$, $\theta_3 = 5.4$ approaches the peak, where the maximum detection accuracy is 0.98.

Figure 9. Detection effects under intensive fault conditions. 
Figure 10. Detection accuracy with different combinations of $\theta_1$ and $\theta_3$.
4.2.4. Communication Energy Consumption

In our model, communication between nodes is simulated using the ZigBee protocol. ZigBee, a personal area network protocol based on IEEE 802.15.4, is a short-range, low-complexity, self-organizing, low-power, high-speed and low-cost wireless communication technology that is well suited to WSNs. The brief frame structure of ZigBee is shown in Figure 11. The frame header consists of bits from the application layer, network layer, MAC layer and physical layer. In our model, messages transmitted between nodes include prejudged statuses, evidences and final judged statuses. Analysis of the ZigBee frame structure shows that this information can be encapsulated in the payload field of the application layer frame. When a status is transmitted, the payload is only 4 bits (2 bits for the message type and 2 bits for the status, namely LG/G), and the total frame length is 47 bits (the header is 43 bits).

Figure 11. Brief frame structure of ZigBee. 
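The radio energy dissipation model is as follows:

$$ E_{Tx}(l, d) = E_{Tx\text{-}elec}(l) + E_{Tx\text{-}amp}(l, d) = \begin{cases} l E_{elec} + l \varepsilon_{fs} d^2, & d < d_0 \\ l E_{elec} + l \varepsilon_{mp} d^4, & d \ge d_0 \end{cases} \quad (40) $$

$$ E_{Rx}(l) = E_{Rx\text{-}elec}(l) = l E_{elec} \quad (41) $$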



$E_{Tx}(l, d)$ is the transmission energy, which includes the electronic energy $E_{Tx\text{-}elec}(l)$ and the amplifier energy $E_{Tx\text{-}amp}(l, d)$, and $l$ is the length of a message in bits. $E_{elec}$ is determined by digital coding, modulation and filtering, and is fixed at 50 nJ/bit; $\varepsilon_{fs}$ = 10 pJ/bit/m² and $\varepsilon_{mp}$ = 0.0013 pJ/bit/m⁴. $d$ is the transmission distance and $d_0$ is the distance threshold, set to 20 m. If $d$ is smaller than $d_0$, energy dissipation is mainly caused by free-space power loss ($d^2$); otherwise, it is mainly caused by multipath power loss ($d^4$).
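The radio model can be written as a short Python sketch using the parameters above; the function names are hypothetical. The usage line prices one 47-bit status frame (43-bit header plus 4-bit payload) sent over 10 m, which falls in the free-space branch.

```python
E_ELEC = 50e-9        # J/bit, electronics energy
EPS_FS = 10e-12       # J/bit/m^2, free-space amplifier coefficient
EPS_MP = 0.0013e-12   # J/bit/m^4, multipath amplifier coefficient
D0 = 20.0             # m, distance threshold used in this paper


def tx_energy(bits, d):
    """E_Tx(l, d), Equation (40): electronics plus distance-dependent amplifier energy."""
    if d < D0:
        return bits * E_ELEC + bits * EPS_FS * d ** 2
    return bits * E_ELEC + bits * EPS_MP * d ** 4


def rx_energy(bits):
    """E_Rx(l), Equation (41): receiving costs only electronics energy."""
    return bits * E_ELEC


STATUS_FRAME_BITS = 43 + 4   # header + status payload
print(tx_energy(STATUS_FRAME_BITS, 10.0), rx_energy(STATUS_FRAME_BITS))
```

At these short distances, the electronics term (2.35 µJ per 47-bit frame) dominates the amplifier term, so the energy cost of fault detection is driven mainly by the number of messages exchanged.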

First, we analyze the number of messages transmitted in the simulated network with 40 working nodes, as depicted in Figure 12. Nodes exchange statuses by radio broadcast, and the accumulated number of messages is recorded over 30 tests. As the number of test rounds grows, the number of messages shows an approximately linear upward trend. Further calculation gives approximate slopes of 266.7, 180.0 and 103.3 for DFD, IDFD and uDFD, respectively. The growth rate of uDFD is clearly the lowest, meaning that it transmits the fewest messages during fault detection. This is because nodes in DFD and IDFD have to exchange all prejudged and final statuses whether they are good or faulty, which leads to more interactions, whereas uDFD only exchanges a message if the tendency status is LG or the final status is good.
Figure 13 compares the average communication energy consumption of each node after 30 tests. The average energy consumption of all nodes in DFD, IDFD and uDFD is 0.235, 0.176 and 0.082 mJ, respectively. Comparing the energy consumption of individual nodes shows that uDFD has the best energy-saving performance, owing to its reduced interactions.



Figure 12. Number of messages accumulated during 30 tests under 40 working node conditions. 
Figure 13. Energy consumption of each node under 40 working node conditions.
Figures 14-16 show the energy consumption of each detection with 40, 80 and 120 working nodes. Further calculation shows that, with 40 working nodes, the average energy consumption over all detections in uDFD is 2.87 mJ lower than that of IDFD and 5.47 mJ lower than that of DFD. With 80 working nodes, it is 4.77 mJ lower than that of IDFD and 10.6 mJ lower than that of DFD.




Figure 14. Energy consumption of each detection under 40 working node conditions. 
Figure 15. Energy consumption of each detection under 80 working node conditions.
Figure 16. Energy consumption of each detection under 120 working node conditions.
With 120 working nodes, the average energy consumption over all detections in uDFD is 9.26 mJ lower than that of IDFD and 19.59 mJ lower than that of DFD. uDFD thus has the best energy-saving performance during detection. Compared with DFD and IDFD, uDFD needs fewer iterations and does not transmit Un statuses, which reduces interactions and energy consumption. Messages carrying evidences use more bits than those carrying statuses, but this has little adverse influence on the overall performance of uDFD.
The simulations above show that uDFD performs well. Whereas traditional DFD and IDFD require every node to broadcast its status, uDFD reduces the communication overhead by broadcasting only the status of nodes determined to be good. Moreover, uDFD achieves higher detection accuracy in environments with high data loss rates.

5. Conclusions 

In this paper, we propose a fault detection mechanism for wireless sensor networks based on data filling and evidence fusion. To counter the loss of detection accuracy caused by missing data, uDFD fills in lost readings and is shown to be better suited to large-scale data loss. In addition, information entropy-based evidence reasoning is used to fuse the collected evidences before node statuses are judged, which balances the influence of all evidences and makes the fusion result more credible. Our algorithm retains higher detection accuracy both in low-connectivity environments and under changing fault ratios. Because only sensors determined to be good need to exchange states for evidence fusion, fewer messages are broadcast during fault detection, and avoiding excessive message exchanges relieves a heavy burden on the limited energy of sensors. In future work, we will address the drop in detection accuracy below 80% as the fault ratio approaches 0.5, for example by reasoning over historical judgment behaviors to increase detection accuracy, and we will consider the cross-impact of more types of faults.